ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters
Abstract
The ClusterWatch middleware provides runtime flexibility in what system-level metrics are monitored, how frequently such monitoring is done, and how metrics are combined to obtain reliable information about the current behavior of GPGPU clusters. Interesting attributes of ClusterWatch are (1) the ease with which different metrics can be added to the system, by simply deploying additional “cluster spies,” (2) the ability to filter and process monitoring metrics at their sources, to reduce data movement overhead, (3) flexibility in the rate at which monitoring is done, (4) efficient movement of monitoring data into backend stores for long-term or historical analysis, and most importantly, (5) specific support for monitoring the behavior and use of the GPGPUs used by applications. This paper presents our initial experiences with using ClusterWatch to assess the performance behavior of a larger-scale GPGPU-based simulation code. We report the overheads seen when using ClusterWatch, the experimental results obtained for the simulation, and the manner in which ClusterWatch will interact with infrastructures for detailed program performance monitoring and profiling such as TAU or Lynx. Experiments conducted on the NICS Keeneland Initial Delivery System (KIDS), with up to 64 nodes, demonstrate low monitoring overheads for high-fidelity assessments of the simulation’s performance behavior, for both its CPU and GPU components.
Publication date: 2013